Unsupervised Query Segmentation Using Monolingual Word Alignment Method

نویسندگان

  • Dayong Wu
  • Yu Zhang
  • Ting Liu
چکیده

In this paper, we propose a novel unsupervised approach to query segmentation using the word alignment model which is usually adopted in statistical machine translation system. Query segmentation is to obtain complete phrases or concepts in a query by segmenting a sequence of query terms, which is an important query processing procedure for improving information retrieval performance in search engines. In this work, we use a novel monolingual word alignment method to segment queries and automatically obtain the query structure in the form of multilevel segmentation. Our approach is language independent and unsupervised so that it is easy to be applied to various language scenarios. Experimental results on a real-world query dataset show that our approach outperforms the state of the art language model based method, which demonstrates the effectiveness of the proposed approach in query segmentation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Nonparametric Word Segmentation for Machine Translation

We present an unsupervised word segmentation model for machine translation. The model uses existing monolingual segmentation techniques and models the joint distribution over source sentence segmentations and alignments to the target sentence. During inference, the monolingual segmentation model and the bilingual word alignment model are coupled so that the alignments to the target sentence gui...

متن کامل

Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...

متن کامل

Empirical Study of Unsupervised Chinese Word Segmentation Methods for SMT on Large-scale Corpora

Unsupervised word segmentation (UWS) can provide domain-adaptive segmentation for statistical machine translation (SMT) without annotated data, and bilingual UWS can even optimize segmentation for alignment. Monolingual UWS approaches of explicitly modeling the probabilities of words through Dirichlet process (DP) models or Pitman-Yor process (PYP) models have achieved high accuracy, but their ...

متن کامل

Improving Word Alignment by Adjusting Chinese Word Segmentation

Most of the current Chinese word alignment tasks often adopt word segmentation systems firstly to identify words. However, word-mismatching problems exist between languages and will degrade the performance of word alignment. In this paper, we propose two unsupervised methods to adjust word segmentation to make the tokens 1-to-1 mapping as many as possible between the corresponding sentences. Th...

متن کامل

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monoling...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer and Information Science

دوره 5  شماره 

صفحات  -

تاریخ انتشار 2012